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Abstract 

Ziggurat and Monty Python are two fast and elegant methods proposed by Marsaglia 
and Tsang to transform uniform random variables to random variables with normal, 
exponential and other common probability distributions. While the proposed methods 
are theoretically correct, we show that there are various design flaws in the uniform 
pseudo random number generators (PRNG's) of their published implementations for 
both the normal and Gamma distributions El El • These flaws lead to non- uniformity 
of the resulting pseudo-random numbers and consequently to noticeable deviations of 
their outputs from the required distributions. In addition, we show that the underly- 
ing uniform PRNG of the published implementation of Matlab's randn, which is also 
based on the Ziggurat method, is not uniformly distributed with correlations between 
consecutive pairs. Also, we show that the simple linear initialization of the registers in 
matlab's randn may lead to non-trivial correlations between output sequences initial- 
ized with different (related or even random unrelated) seeds. These, in turn, may lead 
to erroneous results for stochastic simulations. 



1 Introduction 



Pseudo random number generators (PRNG) are a key component of stochastic simulations. 
Most PRNG's produce sequences of (seemingly) uniformly distributed real numbers in the 
interval [0, 1), typically by quantizing a sequence of integers in the set Qk = {0,1, 2'=-!} via 
division by 2^. Random variables with normal (Gaussian), Gamma or other distributions, 
are then typically constructed via transformations of uniform random variables |31 E] . 
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In jH] , Marsaglia and Tsang proposed the Ziggurat method to generate random variables 
with a decreasing or symmetric unimodal density from uniform random variables. While in 
the original paper |B] the authors assumed the existence of a suitable PRNG that outputs 
uniform random numbers in the range [0, 1), in PP and [2], the same authors refined their 
original method and suggested a specific underlying fast uniform PRNG along with computer 
code to produce normal or Gamma distributed numbers. In a different work the same 
authors proposed the Monty Python method to generate random variables with normal, 
exponential or other common distributions, and proposed a specific implementation based 
on a multiply-with-carry (MWC) uniform PRNG. In this paper we show that there are 
various flaws in the design of these underlying PRNG's, which lead to significant deviations 
of their outputs from uniformity, and consequently poor distributions of the resulting normal 
or Gamma distributions. 

We note that statistical problems with the implementation of were recently noted by 
Leong & al .7\. These authors found that the resulting sequences of normally distributed 
numbers fail the simple test, and attributed this finding to the (relatively) short period 
of the underlying PRNG (2^^ — 1). Our analysis elucidates the design flaws leading to these 
statistical problems, which are mainly due to non-uniformity and correlations of the outputs 
of the PRNG. While the short period of the specific suggestion of PQ|2], only magnifies these 
problems, we show that other PRNG's with much longer periods but same output function 
lead to the same statistical problems. 

Another normal random number generator based on the Ziggurat method is Matlab's 
built-in function randn. We analyze the underlying uniform PRNG of this function, based 
on the matlab code published in [Hj. We show that while individual outputs of this PRNG 
are uniformly distributed, pairs of consecutive outputs are correlated. Since the output of 
the Ziggurat method is highly non-linear in its input, it is difficult to detect these corre- 
lations in the resulting normal random numbers. However, we show that initializations of 
the function randn with different, either related or even random unrelated seeds, as done in 
parallel implementations and other stochastic simulations, may lead to non-trivial correla- 
tions between the resulting output sequences of random numbers. We give a simple example 
where a sequence of such initializations yields incorrect results for a stochastic simulation. 

The paper is organized as follows. In section 2 we present a probabilistic setting in which 
we can properly define statistical properties of sequences of random numbers from PRNG's. 
The design flaws and statistical weaknesses of the PRNG's published in |2] are analyzed 
in section 3. In Section 4 we analyze the weaknesses in the multiply with carry generator 
and their consequences on the Monty Python method of |3]. The analysis of Matlab's randn 
is presented in Section 5. We conclude in Section 6 with a summary and discussion. 

2 Randomness and statistical requirements from a PRNG 

The main goal of a uniform PRNG on a set of possible outputs Q is to produce long sequences 
of numbers which imitate, in a statistical sense, realizations of a sequence of corresponding 
independently identically distributed (i.i.d.) uniform random variables on the same set. 
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While sequences of random variables are a well defined concept within the framework of 
probability theory, the notion of "randomness" in a sequence of numbers produced by a 
deterministic algorithm is problematic and requires proper definition. 

Following ^9j, we consider a uniform random number generator as a finite state machine 
{F, g, s, P) with an internal state set S and output set Q. The current state of the PRNG is 
denoted hj s & S, F : S ^ S is the transition function, g : S ^ fl is the output function, 
and P : T ^ S is the initialization function, where T is a set of possible seeds. The internal 
state is updated according to Sj+i = F{si) while the output at step i is g{si). The state 
machine is initialized by a seed, such that Sq = P{seed). We denote by u{sQ,j) the j-th 
output of a generator with initial state Sq. Note that S and Q need not (and in general 
should not) be of the same size. For example, the internal state could be of size \S\ = 2^^^ 
(e.g. 128 bits), while the output set Q could be of size 2^^. 

Since the PRNG is deterministic, the output sequence is uniquely determined by the 
initial seed and the functions F,g,P. Moreover, since F is a finite state machine, it is 
obviously periodic with some period Z < \S\. We introduce a probability space in this setting 
by considering both the time of observation and the initial state sq as random variables, with 
the initial state uniformly distributed over the set S and the time of observation uniformly 
distributed over the integer set {1, 2 . . . I}. We denote by {Ui, U2, ■ ■ ■} the resulting sequence 
of random variables. 

This sequence should, by definition, have the same probability distribution as that of 
a sequence of i.i.d. uniform random variables {Xi,X2, . . .} over the set Vt. Obviously, the 
sequence {f/j} does not have the same distribution as that of {Xj}, since f/j+/ = f/j, for 
example. Therefore, one of the basic requirements from a PRNG is to have a very long 
period 1^1, significantly longer than the total number of outputs used by the simulation. 
In addition, inside the long period we require that at least the low order statistics of {Ui} 
and {Xi} coincide. Specifically, we consider a PRNG {F,g,s,P) as statistically sound if it 
satisfies (at least) the following requirements (see also PH]): 

1. Uniformity of first order statistics on Q: We require that 

1 ' 

' ' soe5 t=i 

where S{i,j) is the Kroneker delta function, equal to one if i = 

2. Uniformity of second order statistics on Q: We require that 

1 ' 

Pr{(t/i, U2) = (wi, «2)} ^ J] 5^ Siuiso, t),u,)5iuiso, t + 1), ^2) = (2) 

' ' soes t=i ' ' 

Note that combining requirements (0) and (j21) implies that the conditional distribution 
of U2 given Ui is also uniform on the set Q. In other words, observation of a single 
output does not affect the distribution of the next output. 



j and zero otherwise. 
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3. Insensitivity to initialization with related seeds: Recall that the initial state is set 
according to Sq = P{seed). Let U,U' be the random variables that correspond to 
initializations with seeds that differ by A. We require that for any A G T \ {0}, 

Pt{U = u,U' = u'} = ^ 5(u(P(seed),j),M)5(M(^(seec/ + A),j),^^') = 1^ (3) 

seed^T 

For many PRNG's, the set of requirements does not hold exactly. We still consider 

a PRNG as statistically acceptable if the discrepancy between these distributions and the 
corresponding uniform ones is extremely small, say below a threshold e, such that detection 
of this discrepancy would require more than 2^°° outputs, for example. 

Obviously, PRNG's need to satisfy many more requirements to be considered acceptable 
from a statistical point of view, and their output sequences are typically required to pass 
various empirical statistical tests (see, for example [H (TUl US CHI and references therein). 
However, as we shall see below, each one of the requirements Q-© is essential in the context 
of both the Ziggurat and the Monty Python methods, and possibly so in the more general 
context of stochastic simulations. While requirements ([Q) and (j21) seem obvious, requirement 
(j21) can be quite important when different runs are made with different seeds, as in parallel 
implementations of stochastic simulations. 

3 Design Flaws in the uniform RNG of PO, 121 

For the paper to be reasonably self contained, we briefly describe the basic Ziggurat method. 
To generate a non- negative random variable with a monotonically decreasing density f{x) 
from a uniform r.v. U[0, 1], we choose k points = Xq < Xi < X2 < . . . < Xk-i, such that 

- f{xi)) = + Pr{x > Xk-i} 1 <i <k-l 

Given the set {xi}^!^, the Ziggurat method works as follows: 

1. Choose an index < i < A; at random with uniform probability 1/k. 

2. Draw a random number u from the uniform distribution U[0, 1], and let x = uXi. If 
i>l and x < return x. 

3. If i > 1 draw another uniform random number y. If — f{xi)) < f{x) — f{xi) 
return x. 

4. li i = 0, generate an x from the tail x > X127 and return x. 

5. Otherwise, return to step 1. 

The generation of values from the tail of the distribution is described explicitly below. In 
most applications k is chosen to be a power of 2, (typically k = 64,128,256), so that 
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choosing a random index with probabihty is easily done by considering the first few bits 
of a random 32 bit uniformly distributed integer, for example. 

The key point and the beauty of the Ziggurat method is that if the two numbers x and 
y in steps 2 and 3 are indeed (statistically) random, independent and uniformly distributed 
over [0, 1), then the output will be a truly normal distributed random variable. The original 
publication |Q described the method with an unspecified underlying uniform PRNG. How- 
ever, in [H 12] the following code for steps 1-3 was suggested by Marsaglia and Tsang, using 
k = 128, 

unsigned long jsr,jz; 
long hz,iz; 

#define SHR3 (jz=jsr, jsr"=(jsr«13) , jsr"=(jsr>>17) , jsr"=(jsr<<5) , jz+jsr) 
#define RNOR (hz=SHR3, iz=hz&127, (f abs(hz)<kn[iz] )? hz*wn[iz] : nfixO) 
#define UNI (.5 + (signed) SHR3* . 2328306e-9) 

float nfixO 
{ float x,y; 
for(; ;){ 

x = hz * wn[iz] ; 

if(iz==0)-[ // generate an output from the tail 
do{ x=-log(UNl)/x[k-l] ; y=-log(UNI) ; } 
while (y+y<x*x) ; 
return (hz>0)? r+x : -r-x; 

} 

if (fn[iz]+UNI*(fn[iz-l]-fn[iz]) < exp(-.5*x*x) ) return x; 
hz=SHR3; iz=hz&127; if (f abs(hz)<kn[iz] ) return hz*wn[iz] ; 

} 

} 

First, a few explanatory words on the code above: The inline code RNOR produces a normal 
random number. It first calls SHR3, which both updates the 32-bit register jsr and outputs a 
32-bit integer, which should be uniformly distributed in the set . . . 2'^^ — 1. To produce both 
the positive and negative parts of the normal distribution, the output of SHR3 is assigned to 
the (signed) variable hz of type long. The two tables kn and wn are initialized to store the 
following values: kn[i] = 2~^^Xi-i/xi and wn[i] = Xj/2^^ for i > 1 and special values for 
i = 0. The procedure nf ix() takes care of steps 3-5, whenever step 2 fails. 

The register jsr is updated via a linear transformation made up of three shifts, hence 
the name SHR3 (shift register 3). For future use we denote this linear transformation by T, 
so that jsr'^*"'"^) = T(jsr^*^). According to ^H] this transformation has maximal period, e.g. 
2^^ — 1. Note, however, that the output of SHR3 is jsr+T(jsr) (mod 2^^), and not the value 
of j sr itself. We will come back to this point later on. 

In our analysis we will need estimates on the number of outputs required to distinguish 
between a discrete distribution over k values with probabilities {pi,---,Pk) and one with 
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probabilities (gi, . . . , g^.) with = + Si). As shown in the appendix, for a distinguisher 
based on the statistic, the number of required outputs is of the order of 



N 



2k 



0{l/e') 



(4) 



Design Flaw # 1 : The first problem we observe, as also recently noted by Doornik 
is that the same 7 least significant bits of hz are used both for choosing the random index 
< z < 127 in step 1, and for the uniform random number u in step 2 of the algorithm. 
This obviously introduces some statistical dependencies into the algorithm in two different 
locations: The first is that the random numbers from the i-th index all end with the same 
7 least significant bits, and the other is in the computation of the rejection probabilities of 
step 2. Let us roughly estimate this second deviation and its shortcomings: For a 32 bit 
uniform random number and a table with k = 128 (e.g. 7 bits), the fact that the last 7 bits 
are fixed induces errors in the rejection question at step 2 (whether uxi < Xi-i) of the order 
of e = l/2^^~^ = 2~^^ (instead of the quantization error of 2~^^). Therefore, according to 
(jD), to detect such a deviation one would need 0{l/e^) = 2^° outputs. While this design 
flaw certainly produces a deviation from the normal distribution, it is quite negligible as 
compared to the next design flaw that we now describe. 

Design Flaw # 2 : The output of SHR3, of the form x+Tx is highly non-uniform and fails 
to satisfy the basic requirement ((T]). Due to the specific structure of T, the output x+Tx 
is not one-to-one, but rather a contractive mapping with 1543756180 outputs (about 2^^'^) 
not possible at all. Thus the output range of SHR3 is restricted to only about 64% of the 
possible 2'^^ outcomes, with some values 10 times more probable than expected in a uniform 
distribution. Table 1 shows the distribution of the number of sources of a possible value y, 
e.g. the number of values x such that x + Tx = y. 

This non-uniform distribution of SHR3 yields non-negligible deviations from normality 
for the resulting normal random numbers. Due to the structure of T one can prove that the 
lowest seven bits of x+Tx are uniformly distributed. Therefore, the probability of choosing 
a specific index i in step 1 of the algorithm is still 1/128 as should be. However, the non- 
uniformity of the output yields non-negligible deviations in the resulting variables uXi and in 
the expected rejection probabilities at step 2 of the algorithm. We now describe the effects 
of these deviations and estimate the number of outputs needed to detect them in a simple 

test on the resulting normal numbers. 

Let Z denote a standard Gaussian variable with zero mean and unit variance. For 
j = 1, . . . , A; — 1 we define pj = Pr{a;j_i < |Z| < Xj} and pk = Pt{\Z\ > Xk-i}. Let qj denote 
the corresponding probabilities in the Ziggurat algorithm, whose underlying PRNG is SHR3. 
Then, by definition 

k-l 

Qj = Pr{xj_i < < Xj} = ^^Pr{index chosen is i}FT{xj-i < \x\ < Xj\ index i} (5) 

i=0 
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# sources 


7^ of outputs 





1543756180 


1 


1616832933 


2 


808153149 


3 


256471123 


4 


58117590 


5 


10068341 


6 


1391608 


7 


159565 


8 


15358 


9 


1334 


10 


109 


11 


5 


12 


1 


13 






Table 1: Distribution of the number of sources of the transformation x + Tx. 

As discussed above, the 7 least significant bits of SHR3 are uniformly distributed, and there- 
fore Pr{index chosen is z} = 1/k. However, the probabilities Pr{a;j_i < |a;| < Xjlindex i}, 
deviate from the theoretically expected ones. To estimate qj we performed the following 
calculation: We passed over all 2^^ — 1 possible initial values for the register j sr, computed 
the first output of RNOR, and created a histogram of hits into the 128 bins [a;j_i,Xj] and 
[xi27, oo). In table 121 we present the eight bins with the largest deviations (measured as Piej, 
where qi = pi{l + Applying formula (jH), we estimate that after an order of 2^° outputs, 
these deviations from the normal distribution can be detected with a test on these 128 
bins. 

This result is not due to the relatively short period of the register j sr. In figure Q we 
present numerical results of the test done on 200 bins, evenly spaced in the interval 
[—7.0,7.0] as done by Leong et al [7], for two other underlying PRNG's with much longer 
periods, but whose output is computed via x+Tx. The two generators are either a combination 
of SHR with CONG, a multiplicative congruential generator, which is the underlying generator 
of matlab's randn, and the KISS generator [7j, which combines also a multiply with carry 
register. We stress that in both cases, the output is x+Tx instead of the original x, and as 
expected the statistics starts to significantly deviate from its expected mean after roughly 
2^^ outputs. 

3.1 A quick fix ? Wrong Tail Probabilities ! 

Since x+Tx is non-uniform, a possible and natural " quick fix" is to replace the output by the 
state of the 32-bit register jsr, via the following inline code SHRO, 

#define SHRO (jsr~=jsr«13, jsr~= jsr»17, jsr~=jsr«5, jsr) 
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Interval i 




Pi 


Qi 


^i = ili -Pi)/Pi 


Pi^l 


103 


[2.1443, 2.1659] 


0.0016945 


0.0016955 


0.00060306 


6.1625e-010 


82 


[1.7748, 1.7900] 


0.0024789 


0.0024778 


-0.00041684 


4.3072e-010 


109 


[2.2843, 2.3104] 


0.0014902 


0.0014894 


-0.00051947 


4.0212e-010 


92 


[1.9353, 1.9525] 


0.0020838 


0.0020829 


-0.00043891 


4.0143e-010 


108 


[2.2591, 2.2843] 


0.0015242 


0.001525 


0.00047076 


3.3779e-010 


16 


[0.7981, 0.8189] 


0.0119470 


0.011945 


-0.00015320 


2.8041e-010 


104 


[2.1659, 2.1882] 


0.0016603 


0.001661 


0.00040957 


2.7852e-010 


112 


[2.3659, 2.3954] 


0.001387 


0.0013864 


-0.00043609 


2.6378e-010 



Table 2: The eight intervals with the largest relative discrepancies from the normal distri- 
bution, measured as Pisf. 




# samples x 10^" 

Figure 1: values vs. number of samples for generators with output x+Tx. 

Since jsr is a maximal length 32-bit shift register, when averaged over its period, its state 
is a uniformly distributed 32-bit integer number in the range 1 . . . (2^^ — 1), and thus ap- 
proximately satisfies requirement (Q). One may thus be tempted to conclude that replacing 
SHR3 with SHRO fixes all statistical problems, and the resulting sequence of normal random 
numbers from the Ziggurat algorithm should easily pass the statistical test of [7j. 

However, the Ziggurat method with this underlying PRNG also fails the test. The 
reason is that even though the outputs now satisfy requirement (jT]), they fail to satisfy 
requirement Q, and this leads to non- negligible deviations in the tail probabilities (when 
\x\ > 2:127) • As seen from the code RNOR and the function nfixO, a number from the 
tail is produced only when the seven least significant bits of jsr are all zero and in addi- 
tion, the resulting number satisfies the condition fabs(hz)< kn[0]. By enumeration over 
all 2^^ — 1 possible values of jsr with 7 least significant bits all zero, only 2,444,151 val- 
ues pass this test. For each of these numbers we calculated the resulting normal num- 
ber and produced a histogram according to the following eight intervals defined by X* = 
{xi27, 3.75, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5}, where 0:127 = 3.44262. In table El we show the result- 
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Interval 


Pr{s* < \x\ < 


Ft{x\ < \Z\ < a;*+i} 


1 


6.9298e-l 


6.9305e-l 


2 


1.9705e-l 


1.9700e-l 


3 


7.2872e-2 


7.2843e-2 


4 


2.5334e-2 


2.5311e-2 


5 


8.2683e-3 


8.2644e-3 


6 


2.5105e-3 


2.5357e-3 


7 


9.3857e-4 


9.2921e-4 


8 


5.4825e-5 


6.5924e-5 



Table 3: Tail probabilities from the Ziggurat method vs. the correct probabilities from the 
normal distribution. 

ing conditional probabilities Pr{Xj < |x| < Xj^-^ \x\ > ^127} as computed from the Ziggurat 

method with SHRO, vs. the correct probabilities from the normal distribution. Applying ^ 
with A; = 8 we obtain that these deviations can be detected after observation of 0(2^^) 
samples with \x\ > xui or roughly = 0{2^^-^) outputs overall. 



4 Design Flaws in the MWC generator of 

In contrast to the Ziggurat method, which requires pre-computation and storage of tables of 
size k, the Monty Python method can produce random variables with normal and other 
common distributions without the aid of auxiliary tables. In [3], the authors presented the 
method and suggested the following multiply-with-carry (MWC) generator as a source of 
uniformly distributed numbers in the set 1^32, 

unsigned long jsr_z, jsr_w; 

#define ZNEW (jsr_z=36969*(jsr_z&65535) + (jsr_z»16)) 
#define WNEW (jsr_w=18000*(jsr_w&65535) + (jsr_w»16)) 
#define MWC ((ZNEW«16) + (WNEW&65535) ) 

In this specific suggestion, each call to MWC outputs a 32-bit integer, based on two independent 
32-bit registers j sr_z and j sr_w . Some of the properties of the transition function of these 
registers, which is of the form 

(c, w) = ((c + aw)div b, (c + aw)mod h) (6) 

where c is the carry and w is the residual upon division by h have been analyzed by Marsaglia 
[T^ . Couture and L'Ecuyer jT^j- For generalizations to recursions with a similar form, see 
also Goresky and Klapper^lj. The main properties depend on the number m = ah — 1. As 
proven in p^, when m is a safe prime (both m and (m — l)/2 are primes), the period of any 
starting value (c, w), other than the trivial fixed points (0, 0) and (a — 1, 6 — 1) is (m — l)/2. 
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7 MSB's i 


Pr{(z_new >> 9) = i} 


Si 


22 


0.007818141 


7.2200e-3 


107 


0.007806859 


-7.2199e-3 


50 


0.007817052 


5.8263e-3 


79 


0.007807948 


-5.8262e-3 


11 


0.007816705 


5.3825e-3 


118 


0.007808295 


-5.3825e-3 


115 


0.007816347 


4.9250e-3 


14 


0.007808652 


-4.9249e-3 



Table 4: Values of 7 MSB's with largest deviations from uniformity for one orbit of jsr_z. 

Since there are m + 1 = ab possible values for (c, w), it follows that there are two disjoint 
orbits with this period. For the specific values of MWC above, the period of each non-trivial 
orbit of jsr_w is 589823999, while that of jsr_z is 1211400191. Since both periods are prime 
numbers the combined period of MWC is their product, or roughly 2^^'^. 

Despite the long period, the main design flaw in MWC is the non-uniformity of its output. 
Consider for example the 16 most significant bits of MWC, e.g., the lowest 16 bits of the 
register j sr_z. When we consider all possible initializations (c, w), these 16 bits are obviously 
randomly distributed. However, within each disjoint orbit, these 16 bits are non-uniformly 
distributed. To illustrate this, we computed the exact probabilities for the seven most 
significant bits in a given orbit. In a uniform distribution, the probability to obtain each 
one of the 128 possible outcomes should be Pi = 1/128 = 0.0078125. However, as shown 
in table |31 some outcomes have non-negligible deviations from this value. Similarly, the 16 
output bits of jsr_w are also not uniformly distributed within each orbit. Since the union 
of the two disjoint orbits covers almost all possible ab values for {c,w), if deviations in one 
orbit are of the form qi = pi{l then in the other orbit the corresponding deviations are 

approximately q'^ = Pi{l — Si). Therefore, a union of output sequences belonging to disjoint 
orbits is approximately uniformly distributed. This might explain why this generator was 
reported to pass the DIEHARD tests of Marsaglia. However, within each orbit, this deviation 
from uniformity is easily detected with simple statistical tests. Using formula Q, we obtain 
that after 2^^ outputs, it can be detected by a test with 128 bins. We remark that on 
modern computers, 2^^ outputs are typically generated in less than 10 seconds. 

To conclude, while the period of MWC is larger than 2^^, its 32-bit output fails the basic 
requirement (^. These non-negligible deviations from the uniform distribution also lead 
to non-negligible deviations from the normal distribution when applying the Monti Python 
method. These can be easily detected by a test on 16 bins after 2^° outputs, and with 
even fewer outputs if the number of bins is increased. 

We note that the MWC generator is also not suitable for use with the Ziggurat method. 
The reason is that within each orbit, the 7 Isb's of jsr_w are not uniformly distributed. 
Therefore this PRNG would not choose each of the 128 intervals at the required uniform 
probability of 1/128. Indeed numerical experiments show that the statistics on outputs 
of the Ziggurat method based on this PRNG start to significantly deviate from the expected 
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value after about 2^^ outputs. 

Finally we note that MWC is one of the standard uniform random number generators in the 
statistical software R. Given the above non-uniformity of this generator, we caution against 
its use in simulations. 

5 A statistical analysis of Mat lab's randn 
5.1 The underlying uniform PRNG of randn 

The Matlab software has a built-in function randn to produce normally distributed random 
numbers, which is also based on the Ziggurat method. A matlab code compatible with the 
pre-compiled built-in function appears in |l8]|. In contrast to [T], Matlab's randn is based 
on a combination of two different 32-bit registers, jsr and icng. Here is a pseudo-C code 
corresponding to matlab's randn: 

unsigned long jsr, icng; 
long hZjiz; 

#define SHRO (jsr"=jsr<<13, jsr~= jsr>>17, jsr~=jsr«5, jsr) 
#define CNG (icng = 69069*icng+1234567) 

#define RNOR (hz=CNG+SHR0,iz=hz&63, (fabs(hz)<kn[iz] )?hz*wn[iz] :matlab_nf ix() ) 

The first register jsr is updated by SHRO, as a linear shift register with maximal period 
of 2^^ — 1. The second register icng is updated as a multiplicative congruential generator, 
with maximal period 2^^. The output which serves as a uniform random number is their 
sum (jsr + icng) mod 2'^^. We denote the transition function of jsr by T, and that of 
icng by R. We also denote its multiplicative part by Rq, e.g. Ro(x)=69069*x mod 2^^. Since 
the periods of the two registers are relatively prime, the combined period of randn is their 
product, a number close to 2^"^. Matlab uses a table of size 64, and since it is based on 
the original Ziggurat publication [Hj, both the points Xj, the tables kn,wn and the function 
matlab_nf ix() are different from the ones described in section 3. 

Since icng is uniformly distributed over ^32, individual outputs jsr+icng also also uni- 
form over f232 and satisfy requirement (|T)). However, pairs of consecutive outputs are highly 
correlated, and fail requirement (j21). Let yi and 1/2 denote two consecutive outputs of this 
uniform random number generator. Let a, b denote the unknown initial states at time 1 
of the two registers jsr and icng, that is {a + b) = yi. After a single update of the two 
registers, the next output is given by y2 = T{a) + R{b). However, since b = yi — a, we have 
that y2 = T{a) — Roia) + R{yi)- Similar to the analysis of section 3, the transformation 
T(a) — Ro{a) is highly contractive and not one-to-one. Therefore, the pair of outputs {yi,y2) 
is not uniformly distributed over x Q^2 and thus fails to satisfy the requirement (j21). 
Table shows the distribution of T{a) — Ro{a). As shown in the table, some 2^°'^ values 
are not possible, while other values are 10 times more probable than expected in a uniform 
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^ sources 


# of outputs 





1590591029 


1 


1569484236 


2 


784774346 


3 


265026908 


4 


68022535 


5 


14147755 


6 


2484729 


7 


377496 


8 


51341 


9 


6136 


10 


713 


11 


65 


12 


6 


13 


1 



Table 5: Distribution of the number of sources of Tx — RqX. 

distribution. Note that (1/2 — R{yij) mod 2^^ is therefore highly non-smooth, and would not 
pass a test for uniformity. 

We now consider the implications of these findings on the resulting normal numbers as 
computed by randn. Consider, for example, the rejection probabilities at step 3. Since y 
depends on x, the rejection probabilities deviate slighly from the correct ones. However, 
when computing these rejection probabilities over large enough intervals, these x-dependent 
deviations almost cancel out (they are positive for some x and negative for others). Similarly, 
tail probabilities at individual x-values also deviate from their correct values, but when 
averaged over large enough intervals these deviations cancel out. Therefore, even though 
the underlying generator is not uniform in pairs, its effects on the resulting normal random 
numbers is difficult to detect by standard tests. 

5.2 Initialization Issues 

We now consider the initialization of randn and its possible consequences. Matlab provides 
two different initialization options, 

raiidn('state', a); OR randn('state', [a b]'); 

The first sets the initial value of jsr to a, with the initial value of icng set to a fixed value 
362436069. The second option allows to set also the initial value of icng to b. 

In many applications, such as parallel computations and stochastic simulations, there is 
a need to create many independent sequences of normal random numbers. In the case of 
parallel computer systems, it is quite common to initialize the seed of processor number id 
with a seed of the form seedo + id. Quite a few works describe the dangers and possible 
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pitfalls in using sequences of random numbers produced by different initializations of the 
same generator (see [TIH IT^ [T7] and references therein) . We now present a simple example 
of such a pitfall for matlab's randn. 

Suppose we wish to simulate 2^^ different paths of a stochastic system that requires 
normal random numbers on a parallel computer with 256 processors. A possible code can 
be for example 

for i=l:256 

for j=l:256 

send to processor i the following: 
randn (' state [i j]'); 
simulate_random_path() ; 

end 

end 

This code ensures that each simulation thread obtains a different seed. However, consider 
the output sequences resulting from two initializations [i j] and [i j+64]. These two 
initializations have the same initial value for j sr and differ only by the initialization of the 
register icng. Due to the structure of the transition function R of this register, it follows that 
for all subsequent times, both of these sequences will have the same six low significant bits. 
Therefore, neglecting the possible misses in the Ziggurat method, which require a call to 
matlab_nf ix() , both sequences will choose the same indexes (!), and the resulting normal 
numbers will be highly correlated. 

These correlations between output sequences initialized with different but related seeds 
are due to the failure of randn to satisfy requirement Q. However, we might be tempted 
to conclude that if instead we perform initializations with random unrelated seeds the re- 
sulting sequences will be uncorrelated. However, even in such cases there can be non-trivial 
correlations between the first few values of different output sequences. Consider the output 
sequences from n initializations of the form randn( ' state ' ,v[i] ), where {vi\^^i is a set of 
n random non-zero 32 bit integers. Suppose there are four distinct indices, j, /c, /, such that 
Vi ® Vj = Vk Q) vi = a, where © denotes bitwise exclusive or. Then, for the first few outputs, 
the resulting 4-tuples of outputs are not independent. To see this, denote x\x^,x'',x'' the 
resulting first output of the uniform random number generator initialized with Vi,Vj,Vk,vi, 
respectively. Since all these initializations have the same initial value for the register icng, 
the first output is given by 

x' = T{vi) + R{icng) x^ = T{vj) + R{icng) = T{v,) © T{a) + R{icng) 

with similar expressions for x^ and xK As an example, assume that the most significant bit 
of T{a) is zero. Then, with high probability the most significant bit of and x^ (and of 
x*^ and x'') will be the same. Therefore, the resulting normal numbers will have the same 
sign. For the second output the sign will be determined by the most significant bit of T^(a), 
etc. (assuming that no intermediate calls to matlab_nf ix() occurred). Another example 
of dependency occurs if we consider the 7 least significant bits of T(a). For simphcity, if 
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these are all zeros, then the numbers and (and x*^ and x') point to the same index 
for the Ziggurat method. This again leads to correlations between the 4-tuples of normal 
outputs. Needless to say, such correlations between different streams may bias a stochastic 
simulation in unexpected ways. Given n initializations, the average number of such 4-tuples 
is of the order of (^)2~^^. Therefore, after only n = 0(512) random sequences there will be 
on average one such 4-tuple. 

6 Summary and Discussion 

In this paper we presented various statistical weaknesses in the published implementation of 
the Ziggurat and Monty Python methods and in the underlying uniform PRNG of matlab's 
built-in function randn. As also noted in other works, the main take home lessons from 
our analysis are: i) The set S of internal states of the generator should be much larger 
than the output set Q. ii) Correlations between consecutive outputs of a uniform RNG can 
have detrimental effects on the results of a stochastic simulation, iii) The creation of many 
different sequences of random numbers via initializations with different seeds must be done 
with great care. 

Regarding the initialization of random number generators, we note that most implemen- 
tations use P = / or some other relatively simple scheme, in which the seed is typically 
entered into the inner state in a linear fashion. However, since initialization is done only 
rarely it is possible to spend many more CPU cycles on this stage, and make the inner state 
be dependent on the initial seed in a much more complicated and non-linear manner. 

We remark that there is an interesting connection between our analysis of PRNG's and 
cryptanalysis of stream and block cyphers. For example, our analysis of the statistical 
correlations of PRNG's initialized with different but related seeds is similar to the 'related 
key attacks' introduced by Biham in JH]; and used to crack the WEP wireless encryption 
protocol [19] . This serves as yet another justification for making the initialization stage (the 
function P in the notation of section 2) a complicated non-linear function. 

There is yet another connection between PRNG's and cyphers. In cryptography, the 
designer would like the security of a cypher system not be dependent on its specific initial- 
ization by the user (e.g., with say counters or initial values (IV's) increasing by one). We 
submit that a similar requirement should hold in the design of a PRNG. The output of a 
PRNG (and correlations between different output runs) should not be highly dependent on 
simple and natural initializations by the user, who typically does not know nor wishes to 
fully understand the inner workings of the random number generator at his disposal. 

Acknowledgments: The author would like to thank Prof. Adi Shamir for interesting 
discussions. 
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Appendix 



Let X and Y be two discrete random variable over k possible values, with probability dis- 
tributions {pi, . . . ,pk) and (gi, . . . , qk), respectively. Our aim is to estimate the number of 
outputs needed from i.i.d. realizations of Y to check the hypothesis if Y has the same 
probability distribution as X, using the test with k bins. Let {yi, . . . , un) be N random 
samples from the distribution of Y, and let (^i, . . . , Zk) denote the number of occurrences of 
the values {1, . . . ,k) in the sequence {yi}fLi. Then the statistic is given by 



k 



jrt Npi ^ Npi 



Its mean (expected) value is 



ET = y ^ - TV (7) 
If y ~ {qi,...,qk) then each Zi follows a Binomial distribution Bin{N, qi). Therefore, 

Writing q^ = Pi{\ + £i) gives 

k k 

ET = (fc - 1) + (AT - 1) Y^p^e] + Y,e, (9) 

Since a distribution with k degrees of freedom has a variance of 2/c, it follows that to 
distinguish between the distribution of X and Y, the statistic must significantly deviate 
from A; — 1 + \/2k. Thus, we require that 

N^o{^^,\ (10) 
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